1 RECOMBINANT 21 kD COCOA PROTEIN AND PRECURSOR 

2 

3 This invention relates to proteins and nucleic acids derived from or otherwise 

4 related to cocoa. 
5 

6 The beans of the cocoa plant (Theobroma cacao) are the raw material for cocoa, 

7 chocolate and natural cocoa and chocolate flavouring. As described by Rohan 

8 ("Processing of Raw Cocoa for the Market", FAO/UN (1963)), raw cocoa 

9 beans are extracted from the harvested cocoa pod, from which the placenta is 

10 normally removed, the beans are then "fermented" for a period of days, during 

11 which the beans are killed and a purple pigment is released from the cotyledons. 

12 During fermentation "unknown" compounds are formed which on roasting give 

13 rise to characteristic cocoa flavour. Rohan suggests that polyphenols and 

14 theobromine are implicated in the flavour precursor formation. After 

15 fermentation, the beans are dried, during which time the characteristic brown 

16 pigment forms, and they are then stored and shipped. 
17 

18 Biehl et al., 1982 investigated proteolysis during anaerobic cocoa seed 

19 incubation and identified 26kD and 44kD proteins which accumulated during 

20 seed ripening and degraded during germination. Biehl asserted that these were 

21 storage proteins and suggested that they may give rise to flavour-specific 

22 peptides. 
23 

24 Biehl et al., 1985 again asserted that amino acids and peptides were important 

25 for flavours. 
26 

27 Fritz et al., 1985 identified polypeptides of 20kD and 28kD appearing in the 

28 cytoplasmic fraction of cocoa seed extracts at about 100 days after pollination. 

29 It appears that the 20kD protein is thought to have glyceryl acyltransferase 

30 activity. 
31 

32 Pettipher et al., 1990 suggested that peptides are important for cocoa flavour 

33 and referred to 48kD and 28kD storage proteins. 



WO 91/19800 PCT/GB91/00913 



1 

2 In spite of the uncertainties in the art, as summarised above, proteins apparently 

3 responsible for flavour production in cocoa beans have now been identified. 

4 Further, it has been discovered that, in spite of Fritz's caution that "cocoa seed 

5 mRNA levels are notably low compared to other plants" (loc. cit.), it is possible 

6 to apply recombinant DNA techniques to the production of 

7 such proteins. 
8 

9 According to a first aspect of the invention, there is provided a 23kD protein of 

10 Th. cacao or a fragment thereof. 
11 

12 The 23kD protein may be processed in vivo to form a 21kD polypeptide. 
13 

14 According to a second aspect of the invention, there is provided a 21kD protein 

15 of Th. cacao or a fragment thereof. 
16 

17 The term "fragment" as used herein and as applied to proteins or peptides 

18 indicates that a sufficient number of amino acid residues are present for the fragment 

19 to be useful. Typically, at least four, five, six or even at least 10 or 20 amino 

20 acids may be present in a fragment. Useful fragments include those which are 

21 the same as or similar or equivalent to those naturally produced during the 

22 fermentation phase of cocoa bean processing. It is believed that such fragments 

23 take part in Maillard reactions during roasting, to form at least some of the 

24 essential flavour components of cocoa. 
25 

26 Proteins in accordance with the invention may be synthetic; they may be 

27 chemically synthesised or, preferably, produced by recombinant DNA 

28 techniques. Proteins produced by such techniques can therefore be termed 

29 "recombinant proteins". Recombinant proteins may be glycosylated or 

30 non-glycosylated; non-glycosylated proteins will result from prokaryotic 

31 expression systems. 
32 

33 
1 Theobroma cacao has two primary subspecies, Th. cacao cacao and Th. cacao 

2 sphaerocarpum. While proteins in accordance with the invention may be 

3 derived from these subspecies, the invention is not limited solely to these 

4 subspecies. For example, many cocoa varieties are hybrids between different 

5 species; an example of such a hybrid is the trinitario variety. 
6 

7 The invention also relates to nucleic acid, particularly DNA, coding for the 

8 proteins referred to above (whether the primary translation products, the 

9 processed proteins or fragments). The invention therefore also provides, in 
10 further aspects: 

11 

12 - nucleic acid coding for a 23kD protein of Th. cacao or for a 

13 fragment thereof; and 
14 

15 - nucleic acid coding for a 21kD protein of Th. cacao or for a 

16 fragment thereof. 
17 

18 Included in the invention is nucleic acid which is degenerate for the wild type 

19 protein and which codes for conservative or other non-deleterious mutants. 

20 Nucleic acid which hybridises to the wild type material is also included. 
21 

22 Nucleic acid within the scope of the invention will generally be recombinant 

23 nucleic acid and may be in isolated form. Frequently, nucleic acid in 

24 accordance with the invention will be incorporated into a vector (whether an 

25 expression vector or otherwise) such as a plasmid. Suitable expression vectors 

26 will contain an appropriate promoter, depending on the intended expression 

27 host. For yeast, an appropriate promoter is the yeast pyruvate kinase (PK) 

28 promoter; for bacteria an appropriate promoter is a strong lambda promoter. 
29 

30 
31 
32 
33 
1 Expression may be secreted or non-secreted. Secreted expression is preferred, 

2 particularly in eukaryotic expression systems; an appropriate signal sequence 

3 may be present for this purpose. Signal sequences derived from the expression 

4 host (such as that from the yeast alpha-factor in the case of yeast) may be more 

5 appropriate than native cocoa signal sequences. 
6 

7 The invention further relates to host cells comprising nucleic acid as described 

8 above. Genetic manipulation may for preference take place in prokaryotes. 

9 Expression will for preference take place in a food-approved host. The yeast 
10 Saccharomyces cerevisiae is particularly preferred. 

11 

12 The invention also relates to processes for preparing nucleic acid and protein as 

13 described above by nucleic acid replication and expression, respectively. 
14 

15 cDNA in accordance with the invention may be useful not only for obtaining 

16 protein expression but also for Restriction Fragment Length Polymorphism 

17 (RFLP) studies. In such studies, detectably labelled cDNA (e.g. radiolabelled) is 

18 prepared. DNA of a cultivar under analysis is then prepared and digested with 

19 restriction enzymes. Southern blotting with the labelled cDNA may then enable 

20 genetic correlations to be made between cultivars. Phenotypic correlations may 

21 then be deduced. 
22 
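The RFLP comparison described above can be illustrated with a minimal sketch. It assumes a single restriction enzyme with the EcoRI recognition site (GAATTC, cut after the first G) and two short, purely hypothetical cultivar sequences; none of these sequences comes from the specification. A real Southern blot would show only those fragments that hybridise to the labelled cDNA, so this is a simplification of the fragment-length comparison only.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Return the fragment lengths produced by cutting `sequence` at every
    occurrence of `site`, `cut_offset` bases into the site."""
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical sequences: cultivar B has lost the second site (GAATTC -> GAGTTC).
cultivar_a = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
cultivar_b = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAGTTC" + "TT"

fragments_a = digest(cultivar_a)
fragments_b = digest(cultivar_b)

print(fragments_a)  # [5, 14, 7]
print(fragments_b)  # [5, 21]
print("polymorphic" if fragments_a != fragments_b else "identical pattern")
```

A difference in fragment lengths between two cultivars is the polymorphism that the labelled cDNA probe would reveal on the blot.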

23 The invention will now be illustrated by the following non-limiting examples. 

24 The examples refer to the accompanying drawings, in which: 
25 

26 Figure 1 shows a map of a full length cDNA clone hybridising with an 

27 oligonucleotide probe for the 21kD protein, together with the regions covered 

28 by DNA sequencing; 
29 

30 Figure 2 shows the DNA sequence of cDNA coding for the 21kD protein and 

31 the presumed amino-acid sequence of the encoded 23 kD precursor; 
32 

33 
1 Figure 3 shows the relationship between the 21kD protein and trypsin inhibitors 

2 from other plants; 
3 

4 Figure 4 shows a map of plasmid pJLA502; 
5 

6 Figure 5 shows two yeast expression vectors useful in the present invention; 

7 vector A is designed for internal expression and vector B is designed for 

8 secreted expression; 
9 

10 Figure 6a shows, in relation to vector A, part of the yeast pyruvate kinase gene 

11 showing the vector A cloning site, and the use of Hin-Nco linkers to splice in 

12 the 21kD gene; 
13 

14 Figure 6b shows, in relation to vector B, part of the yeast alpha-factor signal 

15 sequence showing the vector B cloning site, and the use of Hin-Nco linkers to 

16 create an in-phase fusion; and 
17 

18 Figure 7 shows a map of plasmids pMY9 and pMY10, referred to in Example 

19 16. 
20 

21 EXAMPLES 
22 

23 Example 1 
24 

25 Identification of the Major Seed Proteins 
26 

27 It is not practicable to extract proteins directly from cocoa beans due to the high 

28 fat and polyphenol contents, and proteins were, therefore, extracted from 

29 acetone powders made as follows. Mature beans from cocoa of West African 

30 origin (Theobroma cacao amelonada) were lyophilised and ground roughly in a 

31 pestle and mortar. Lipids were extracted by Soxhlet extraction with diethyl 

32 ether for two periods of four hours, the beans being dried and further ground 
33 
1 between extractions. Polyphenols and pigments were then removed by several 

2 extractions with 80% acetone, 0.1% thioglycollic acid. After extraction the 

3 resulting paste was dried under vacuum and ground to a fine powder. 






4 

5 Total proteins were solubilised by grinding the powder with extraction buffer 

6 (0.05 M sodium phosphate, pH 7.2; 0.01 M 2-mercaptoethanol; 1% SDS) in a 

7 hand-held homogeniser, at 5mg/ml. The suspension was heated at 95°C for 5 

8 minutes, and centrifuged at 18 K for 20 minutes to remove insoluble material. 

9 The resulting clear supernatant contained about 1 mg/ml total protein. 
10 Electrophoresis of 25 µl on an SDS-PAGE gel (Laemmli, 1970) gave three 

11 major bands, including one at 21 kD, comprising approximately 30% of the 

12 total proteins. The 21 kD protein is presumed to be the polypeptide subunit of a 

13 major storage protein. 
14 

15 Characteristics of the Storage Polypeptide 
16 

17 The solubility characteristics of the 21 kD polypeptide were roughly defined by 

18 one or two quick experiments. Dialysis of the polypeptide solution against 

19 SDS-free extraction buffer rendered some polypeptides insoluble, as judged by 

20 their ability to pass through a 0.22 micron membrane, whereas the 21 kD 

21 polypeptide remained soluble. Only the 21 kD polypeptide was extracted from 

22 the acetone powder by water and dilute buffers, showing that this protein could 

23 be classed as an albumin. 
24 

25 Purification of the major polypeptide 
26 

27 The 21 kD polypeptide was purified by two rounds of gel filtration on a 

28 SUPEROSE-12 column of the PHARMACIA Fast Protein Liquid 

29 Chromatography system (FPLC), or by electroelution of bands after preparative 

30 electrophoresis. (The words SUPEROSE and PHARMACIA are trade marks.) 

31 Concentrated protein extracts were made from 50 mg acetone powder per ml of 

32 extraction buffer, and 1-2 ml loaded onto 2 mm thick SDS-PAGE gels poured 

33 without a comb. After electrophoresis the gel was surface stained in aqueous 
1 Coomassie Blue, and the major bands cut out with a scalpel. Gel slices were 

2 electroeluted into dialysis bags in electrophoresis running buffer at 15 V for 24 

3 hours, and the dialysate dialysed further against 0.1% SDS. Samples could be 

4 concentrated by lyophilisation. 
5 

6 Example 2 
7 

8 Amino-acid Sequence Data from Protein 
9 


10 Protein samples (about 10 µg) were subjected to conventional N-terminal 

11 amino-acid sequencing. A 12 amino-acid sequence was obtained for the 21 kD 

12 protein, and this information was used to construct an oligonucleotide probe 

13 (Woods et al, 1982; Woods, 1984). 
14 

15 Example 3 
16 

17 Raising Antibodies to the 21 kD Polypeptide 
18 

19 Polyclonal antibodies were prepared using the methodology of Catty and 

20 Raykundalia (1988). The serum was aliquoted into 1 ml fractions and stored at 

21 -20°C. 
22 

23 Characterising Antibodies to the 21 kD Polypeptide 
24 

25 Serum was immediately characterised using the Ouchterlony double-diffusion 

26 technique, whereby antigen and antibody are allowed to diffuse towards one 

27 another from wells cut in agarose in borate-saline buffer. Precipitin lines are 

28 formed where the two interact if the antibody 'recognises' the antigen. This test 

29 showed that antibodies to the 21 kD protein antigen had been formed. 
30 

31 The gamma-globulin fraction of the serum was partially purified by 

32 precipitation with 50% ammonium sulphate, solubilisation in 

33 phosphate-buffered saline (PBS) and chromatography on a DE 52 cellulose 
1 ion-exchange column as described by Hill, 1984. Fractions containing 

2 gamma-globulin were monitored at 280 nm (OD280 of 1.4 is equivalent to 1 

3 mg/ml gamma-globulin) and stored at -20°C. 
4 
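The conversion quoted above (an absorbance of 1.4 at 280 nm corresponding to 1 mg/ml gamma-globulin) can be written as a one-line calculation. The sketch below assumes that the relationship is linear over the range measured; the function name and the example readings are illustrative and are not taken from the specification.

```python
def gamma_globulin_mg_per_ml(od280, od_per_mg_per_ml=1.4):
    """Estimate gamma-globulin concentration (mg/ml) from absorbance at 280 nm.

    Assumes a linear relationship, using the stated equivalence
    OD280 of 1.4 = 1 mg/ml.
    """
    if od280 < 0:
        raise ValueError("absorbance cannot be negative")
    return od280 / od_per_mg_per_ml


# Illustrative readings only.
for reading in (0.35, 0.7, 1.4):
    print(reading, round(gamma_globulin_mg_per_ml(reading), 3))
```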

5 The effective titre of the antibodies was measured using an enzyme-linked 

6 immunosorbent assay (ELISA). The wells of a polystyrene microtitre plate 

7 were coated with antigen (10-1000 ng) overnight at 4°C in carbonate coating 

8 buffer. Wells were washed in PBS-Tween and the test gamma globulin added at 

9 concentrations of 10, 1 and 0.1 µg/ml (approximately 1:100, 1:1000 and 

10 1:10,000 dilutions). The diluent was PBS-Tween containing 2% polyvinyl 

11 pyrrolidone (PVP) and 0.2% BSA. Controls were preimmune serum from the 

12 same animal. Binding took place at 37°C for 3-4 hours. The wells were 

13 washed as above and secondary antibody (goat anti-rabbit IgG conjugated to 

14 alkaline phosphatase) added at a concentration of 1 µg/ml, using the same 

15 conditions as the primary antibody. The wells were again washed, and alkaline 

16 phosphatase substrate (p-nitrophenyl phosphate; 0.6 mg/ml in diethanolamine 

17 buffer pH 9.8) added. The yellow colour, indicating a positive reaction, was 

18 allowed to develop for 30 minutes and the reaction stopped with 3M NaOH. 

19 The colour was quantified at 405 nm. More detail of this method is given in Hill, 

20 1984. The method confirmed that the antibodies all had a high titre and could 

21 be used at 1 µg/ml concentration. 
22 

23 Example 4 
24 

25 Isolation of Total RNA from Immature Cocoa Beans 
26 

27 The starting material for RNA which should contain a high proportion of 

28 mRNA specific for the storage proteins was immature cocoa beans, at about 130 

29 days after pollination. Previous work had suggested that synthesis of storage 

30 proteins was approaching its height by this date (Biehl et al, 1982). The beans 

31 are roughly corrugated and pale pinkish-purple at this age. 
32 

33 
1 The initial requirement of the total RNA preparation from cocoa beans was that 

2 it should be free from contaminants, as judged by the UV spectrum, particularly 

3 in the far UV, where a deep trough at 230 nm (260 nm : 230 nm ratio is 

4 approximately 2.0) is highly diagnostic of clean RNA, and is intact, as judged 

5 by agarose gel electrophoresis of heat-denatured samples, which should show 

6 clear rRNA bands. A prerequisite for obtaining intact RNA is scrupulous 

7 cleanliness and rigorous precautions against RNases, which are ubiquitous and 

8 extremely stable enzymes. Glassware is customarily baked at high 

9 temperatures, and solutions and apparatus treated with the RNase inhibitor 
10 diethyl pyrocarbonate (DEPC, 0.1%) before autoclaving. 

11 

12 The most routine method for extraction of plant (and animal) RNA is extraction 

13 of the proteins with phenol/chloroform in the presence of SDS to disrupt 

14 protein-nucleic acid complexes, and inhibit the RNases which are abundant in 

15 plant material. Following phenol extraction the RNA is pelleted on a caesium 

16 chloride gradient before or after ethanol precipitation. This method produced 

17 more or less intact RNA, but it was heavily contaminated with dark brown 

18 pigment, probably oxidised polyphenols and tannins, which always co-purified 

19 with the RNA. High levels of polyphenols are a major problem in Theobroma 

20 tissues. 
21 

22 A method was therefore adopted which avoided the use of phenol, and instead 

23 used the method of Hall et al. (1978) which involves breaking the tissue in hot 

24 SDS-borate buffer, digesting the proteins with proteinase K, and specifically 

25 precipitating the RNA with LiCl. This method gave high yields of reasonably 

26 clean, intact RNA. Contaminants continued to be a problem and the method 

27 was modified by introducing repeated LiCl precipitation steps, the precipitate 

28 being dissolved in water and clarified by microcentrifugation after each step. 

29 This resulted in RNA preparations with ideal spectra, which performed well in 

30 subsequent functional tests such as in vitro translation. 
31 

32 
33 
1 Preparation of mRNA From Total RNA 
2 

3 The mRNA fraction was separated from total RNA by affinity chromatography 

4 on a small (1 ml) oligo-dT column, the mRNA binding to the column by its 

5 poly A tail. The RNA (1-2 mg) was denatured by heating at 65°C and applied 

6 to the column in a high salt buffer. Poly A+ was eluted with low salt buffer, 

7 and collected by ethanol precipitation. The method is essentially that of Aviv 

8 and Leder (1972), modified by Maniatis et al (1982). From 1 mg of total 

9 RNA, approximately 10-20 µg polyA+ RNA was obtained (1-2%). 
10 

11 In vitro Translation of mRNA 
12 

13 The ability of mRNA to support in vitro translation is a good indication of its 

14 cleanliness and intactness. Only mRNAs with an intact polyA tail (3' end) will 

15 be selected by the oligo-dT column, and only mRNAs which also have an intact 

16 5' end (translational start) will translate efficiently. In vitro translation was 

17 carried out using RNA-depleted wheat-germ lysate (Amersham International), 

18 the de novo protein synthesis being monitored by the incorporation of 

19 [35S]-methionine (Roberts and Paterson, 1973). Initially the rate of de novo 

20 synthesis was measured by the incorporation of [35S]-methionine into 

21 TCA-precipitable material trapped on glass fibre filters (GFC, Whatman). The 

22 actual products of translation were investigated by running on SDS-PAGE, 

23 soaking the gel in fluor, drying the gel and autoradiography. The mRNA 

24 preparations translated efficiently and the products covered a wide range of 

25 molecular weights, showing that intact mRNAs for even the largest proteins had 

26 been obtained. None of the major translation products corresponded in size to 

27 the 21 kD polypeptide identified in mature beans, and it was apparent that 

28 considerable processing of the nascent polypeptide must occur to give the 

29 mature form. 
30 

31 
32 
33 
1 Example 5 

2 

3 Identification of Precursor to the Mature Polypeptide by Immunoprecipitation 
4 


5 Because the 21 kD storage polypeptide was not apparent amongst the 

6 translation products of mRNA from developing cocoa beans, the technique of 

7 immunoprecipitation, with specific antibodies raised to the 21 kD polypeptide, 

8 was used to identify the precursors from the translation mixture. This was done 

9 for two reasons: first to confirm that the appropriate mRNA was present before 

10 cloning, and second to gain information on the expected size of the encoding 

11 gene. 
12 

13 Immunoprecipitation was by the method of Cuming et al., 1986. [35S]-labelled 

14 in vitro translation products were dissociated in SDS, and allowed to bind with 

15 specific antibody in PBS plus 1% BSA. The antibody-antigen mixture was then 

16 mixed with protein A-SEPHAROSE and incubated on ice to allow the IgG to 

17 bind to protein A. The slurry was poured into a disposable 1 ml syringe, and 

18 unbound proteins removed by washing with PBS +1% NONIDET P-40. The 

19 bound antibody was eluted with 1M acetic acid and the proteins precipitated 

20 with TCA. The antibody-antigen complex was dissociated in SDS, and subjected 

21 to SDS-PAGE and fluorography, which reveals which labelled antigens have 

22 bound to the specific antibodies. 
23 

24 The results showed that the anti-21 kD antibody precipitated a 23 kD precursor. 

25 The precursor size corresponded to a major band on the in vitro translation 

26 products. 
27 

28 
29 
30 
31 
32 
33 
1 Example 6 
2 

3 cDNA Synthesis From the mRNA Preparations 
4 

5 cDNA synthesis was carried out using a kit from Amersham International. The 

6 first strand of the cDNA is synthesised by the enzyme reverse transcriptase, 

7 using the four nucleotide bases found in DNA (dATP, dTTP, dGTP, dCTP) and 

8 an oligo-dT primer. The second strand synthesis was by the method of Gubler 

9 and Hoffman (1983), whereby the RNA strand is nicked in many positions by 

10 RNase H, and the remaining fragments used to prime the replacement synthesis 

11 of a new DNA strand directed by the enzyme E. coli DNA polymerase I. Any 

12 3' overhanging ends of DNA are filled in using the enzyme T4 polymerase. 

13 The whole process was monitored by adding a small proportion of [ 32 P]-dCTP 

14 into the initial nucleotide mixture, and measuring the percentage incorporation 

15 of label into DNA. Assuming that cold nucleotides are incorporated at the same 

16 rate, and that the four bases are incorporated equally, an estimate of the 

17 synthesis of cDNA can be obtained. From 1 µg of mRNA approximately 140 

18 ng of cDNA was synthesised. The products were analysed on an alkaline 1.4% 

19 agarose gel as described in the Amersham methods. Globin cDNA, synthesised 

20 as a control with the kit, was run on the same gel, which was dried down and 

21 autoradiographed. The cocoa cDNA had a range of molecular weights, with a 

22 substantial amount larger than the 600 bp of the globin cDNA. 
23 

24 Example 7 
25 

26 Cloning of cDNA into a Plasmid Vector by Homopolymer Tailing 
27 

28 The method of cloning cDNA into a plasmid vector was to 3' tail the cDNA 

29 with dC residues using the enzyme terminal transferase (Boehringer Corporation 

30 Ltd), and anneal into a PstI-cut and 3' tailed plasmid (Maniatis et al, 1982; 

31 Eschenfeldt et al, 1987). The optimum length for the dC tail is 12-20 residues. 

32 The tailing reaction (conditions as described by the manufacturers) was tested 
33 
1 with a 1.5 kb blunt-ended restriction fragment, taking samples at intervals, and 

2 monitoring the incorporation of a small amount of [32P]-dCTP. A sample of 

3 cDNA (70 ng) was then tailed using the predetermined conditions. 
4 

5 A dG-tailed plasmid vector (3'-oligo(dG)-tailed pUC9) was purchased from 

6 Pharmacia. 15 ng vector was annealed with 0.5 - 5 ng of cDNA at 58°C for 2 

7 hours in annealing buffer: 5 mM Tris-HCl pH 7.6; 1 mM EDTA; 75 mM NaCl 

8 in a total volume of 50 µl. The annealed mixture was transformed into E. coli 

9 RRI (Bethesda Research Laboratories), transformants being selected on L-agar 

10 + 100 µg/ml ampicillin. Approximately 200 transformants per ng of cDNA 

11 were obtained. Transformants were stored by growing in 100 µl L-broth in the 

12 wells of microtitre plates, adding 100 µl 80% glycerol, and storing at -20°C. 
13 

14 Some of the dC tailed cDNA was size selected by electrophoresing on a 0.8% 

15 agarose gel, cutting slits in the gel at positions corresponding to 0.5, 1.0 and 

16 1.5 kb, inserting DE81 paper and continuing electrophoresis until the cDNA 

17 had run onto the DE81 paper. The DNA was then eluted from the paper with 

18 high salt buffer, according to the method of Dretzen et al (1981). 
19 

20 Example 8 
21 

22 Construction of Oligonucleotide Probes for the 21 kD Gene 
23 

24 The N-terminus of the 21 kD polypeptide, as determined in Example 2 above, 

25 was 

26 Ala-Asn-Ser-Pro-Leu-Asp-Thr-Asp-Gly-Asp-Glu. 

27 

28 From this the optimum region for synthesising a probe of 17 residues was as 

29 follows: 
30 

31 
32 
33 
        Asp Thr Asp Gly Asp Glu

    5'  GAC ACC GAC GGC GAC GA   3'
          T   T   T   T   T
              A       A
              G       G



The 17-mer probe constructed is shown below the sequence: it is actually a 
mixture of 128 different 17-mers, one of which must be the actual coding 
sequence. Probe synthesis was carried out using an Applied Biosystems 
apparatus. 
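The degeneracy of the mixed probe can be checked by enumerating it. The sketch below encodes each of the 17 positions as the set of bases allowed there, following the codon choices for Asp (GAC/GAT), Thr (ACN), Gly (GGN) and the first two bases of Glu; the variable and function names are illustrative only.

```python
from itertools import product

# Bases allowed at each position of the 17-mer, read 5' to 3'.
PROBE_POSITIONS = [
    "G", "A", "CT",    # Asp: GAC / GAT
    "A", "C", "CTAG",  # Thr: ACN
    "G", "A", "CT",    # Asp
    "G", "G", "CTAG",  # Gly: GGN
    "G", "A", "CT",    # Asp
    "G", "A",          # first two bases of Glu (GAA / GAG)
]


def expand_probe(positions):
    """Return every concrete sequence encoded by a degenerate probe."""
    return ["".join(bases) for bases in product(*positions)]


probes = expand_probe(PROBE_POSITIONS)
print(len(probes))     # 128
print(len(probes[0]))  # 17
print(probes[0])       # GACACCGACGGCGACGA
```

The count is 2 x 4 x 2 x 4 x 2 = 128, which agrees with the figure given for the probe mixture.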

The 21 kD probe was purified by electrophoresis on a 20% acrylamide gel, the 
bands being detected by UV shadowing, and eluted by dialysing against water. 

Example 9 

Use of Oligonucleotides to Probe cDNA Library 

The oligonucleotide probes were 5' end-labelled with gamma-[32P] dATP and 
the enzyme polynucleotide kinase (Amersham International). The method was 
essentially that of Woods (1982, 1984), except that a smaller amount of isotope 
(15 µCi) was used to label about 40 ng probe, in 10 mM MgCl2; 100 mM 
Tris-HCl, pH 7.6; 20 mM 2-mercaptoethanol. 

The cDNA library was grown on GeneScreen (New England Nuclear) nylon 
membranes placed on the surface of L-agar + 100 µg/ml ampicillin plates. (The 
word GeneScreen is a trade mark.) Colonies were transferred from microtitre 
plates to the membranes using a 6 x 8 multi-pronged device, designed to fit into 
the wells of half the microtitre plate. Colonies were grown overnight at 37°C, 
lysed in sodium hydroxide and bound to membranes as described by Woods 
(1982, 1984). After drying the membranes were washed extensively in 3 x 
SSC/0.1% SDS at 65°C, and hybridised to the labelled probe, using a HYBAID 
apparatus from Hybaid Ltd, PO Box 82, Twickenham, Middlesex. (The word 
1 HYBAID is a trade mark.) Conditions for hybridisation were as described by 

2 Mason & Williams (1985), a Td being calculated for each oligonucleotide 

3 according to the formula: 
4 

5 Td = 4°C per GC base pair + 2°C per AT base pair. 

6 At mixed positions the lowest value is taken. 
7 

8 Hybridisation was carried out at Td - 5°C. Washing was in 6 x SSC, 0.1% SDS 

9 initially at room temperature in the HYBAID apparatus, then at the 

10 hybridisation temperature (Td - 5°C) for some hours, and finally at Td for 

11 exactly 2 minutes. Membranes were autoradiographed onto FUJI X-ray film, 

12 with intensifying screens at -70°C. (The word FUJI is a trade mark.) After 24 

13 - 48 hours positive colonies stood out as intense spots against a low background. 
14 


15 Example 10 
16 


17 Analysis of Positive Clones for the 21 kD Polypeptide 
18 


19 Several positive clones were obtained with the 21 kD probe, and most of these 

20 contained an insert of 0.9 kb when digested with PstI (the original vector PstI 

21 site is re-created by the dG/dC tailing procedure). The inserts had the same 

22 restriction pattern, and are easily large enough to encode the 23 kD precursor, 

23 and it therefore seemed likely that they represented full-length clones. A map 

24 of the insert is shown in Figure 1. 
25 

26 The 0.9 kb PstI fragment was purified away from the vector by agarose gel 

27 electrophoresis onto DE81 paper (Dretzen et al., 1981), and about 500 ng was 

28 nick-translated using the Amersham nick-translation kit. The resulting probe 

29 was ~4 x 10^7 cpm and 10^6 cpm were used for the subsequent probing of the 

30 cDNA library, using the hybridisation method described by Wahl and Berger 

31 (1987). The conditions of 50% formamide and 42°C were used. Several more 

32 incomplete positive clones were obtained, which were useful in subsequent 

33 sequencing. 
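The dissociation-temperature rule used for the oligonucleotide screen in Example 9 (4°C per GC base pair, 2°C per AT base pair, lowest value at mixed positions) can be sketched as a short calculation. The position list below is the 17-mer probe of Example 8 written as allowed bases per position; the resulting figures are computed here for illustration and are not stated in the specification.

```python
def dissociation_temp(positions):
    """Td in degrees C: 4 per G/C position, 2 per A/T position.

    Each element of `positions` is a string of the bases allowed at that
    position; at mixed positions the lowest contribution is taken.
    """
    total = 0
    for options in positions:
        if not options or any(base not in "ACGT" for base in options):
            raise ValueError(f"invalid position: {options!r}")
        total += min(4 if base in "GC" else 2 for base in options)
    return total


# The degenerate 17-mer for Asp-Thr-Asp-Gly-Asp-Glu, 5' to 3'.
PROBE_17MER = [
    "G", "A", "CT",
    "A", "C", "CTAG",
    "G", "A", "CT",
    "G", "G", "CTAG",
    "G", "A", "CT",
    "G", "A",
]

td = dissociation_temp(PROBE_17MER)
print(td)      # 48
print(td - 5)  # 43, the hybridisation temperature under the Td - 5°C rule
```

Taking the lowest value at mixed positions keeps the conditions permissive enough for the least stable member of the probe mixture.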
1 

2 Example 11 

3 

4 Sequencing the Cloned Inserts 
5 

6 The sequencing strategy was to clone the inserts, and where appropriate 

7 subclones thereof, into the multiple cloning site of the plasmids 

8 pTZ18R/pTZ19R (Pharmacia). These plasmids are based on the better-known 

9 vectors pUC18/19 (Norrander et al., 1983), but contain a single-stranded origin 

10 of replication from the filamentous phage f1. When superinfected with phages 

11 in the same group, the plasmid is induced to undergo single-stranded 

12 replication, and the single-strands are packaged as phages extruded into the 

13 medium. DNA can be prepared from these 'phages' using established methods 

14 for M13 phages (Miller, 1987), and used for sequencing by the method of 

15 Sanger (1977) using the reverse sequencing primer. The superinfecting phage 

16 used is a derivative of M13 termed M13K07, which replicates poorly and so 

17 does not compete well with the plasmid, and contains a selectable 

18 kanamycin-resistance marker. Detailed methods for preparing single-strands 

19 from the pTZ plasmids and helper phages are supplied by Pharmacia. DNA 

20 sequence was compiled and analysed using the Staden package of programs 

21 (Staden, 1986), on a PRIME 9955 computer. (The word PRIME is a trade 

22 mark.) 
23 

24 Example 12 
25 

26 Features of the 21 kD cDNA, and Deduced Amino-acid Sequence of the 23 kD 

27 Precursor 
28 

29 The DNA sequence of the 21 kD cDNA, and the presumed amino-acid sequence 

30 of the encoded 23 kD precursor is shown in Figure 2. The cDNA is 917 bases, 

31 excluding the 3' poly A tail. The ATG start codon is at position 21, followed 

32 by an open reading frame of 221 codons, ending with a stop codon at position 

33 684. This is followed by a 233-base untranslated region, which is relatively 




1 AT-rich (60%) and has several stop codons in all three frames. There are two 

2 polyadenylation signals (AATAAA) at positions 753 and 887 (Proudfoot and 

3 Brownlee, 1976). At position 99 the sequence corresponding to the 

4 oligonucleotide probe is found, and at 167 the Cla site found experimentally. 
5 

6 The presumed 23 kD precursor polypeptide comprises 221 amino-acids and has a 

7 molecular weight of 24003. The mature N-terminus is found at position 27, and 

8 the first 26 residues are highly hydrophobic, characteristic of a signal sequence 

9 recognised by the proteins responsible for translocating newly-synthesised 

10 proteins across membranes in the process of compartmentalisation (Kreil, 1981). 

11 The mature protein has 195 residues and a molecular weight of 21223, in good 

12 agreement with that deduced from polyacrylamide gels. The amino-acid 

13 composition of the mature protein is typical of a soluble protein with 24% 

14 charged residues and about 20% hydrophobic residues. 
15 
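The positions and sizes quoted for the cDNA in this Example can be cross-checked with simple arithmetic. The sketch below uses only the numbers given in the text (917 bases, ATG at position 21, 221 codons, 26-residue signal sequence, molecular weights of 24003 and 21223). Counting the untranslated region from the quoted stop-codon position is an interpretation adopted here because it reproduces the stated 233 bases; the function and key names are illustrative.

```python
def orf_summary(cdna_length, start, codons, signal_residues):
    """Derive stop position and sizes from the quoted cDNA coordinates."""
    stop_position = start + 3 * codons  # first base of the stop codon
    return {
        "stop_position": stop_position,
        # Bases from the quoted stop position to the end of the cDNA.
        "bases_after_stop_position": cdna_length - stop_position,
        "mature_residues": codons - signal_residues,
    }


summary = orf_summary(cdna_length=917, start=21, codons=221, signal_residues=26)
print(summary["stop_position"])              # 684
print(summary["bases_after_stop_position"])  # 233
print(summary["mature_residues"])            # 195

# Mean residue mass implied by the quoted molecular weights.
print(round(24003 / 221, 1), round(21223 / 195, 1))
```

The derived values agree with the stop codon position, untranslated region length and mature protein length given above.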

16 Homologies Between the 21 kD Protein and Other Known Proteins 
17 

18 Searching the protein identification resource (PIR) databank (National 

19 Biomedical Research Foundation, Washington DC) using the sequence matching 

20 program FASTP (Lipman and Pearson, 1985), showed a high degree of 

21 homology between the 21 kD protein and Kunitz-type protease and α-amylase 

22 inhibitors found in large amounts in the seeds of several species, particularly 

23 legumes and cereals. Examples, shown in Figure 3, include the barley 

24 α-amylase/subtilisin inhibitor, B-ASI (Svendsen et al. 1986), wheat α-amylase/ 

25 subtilisin inhibitor, W-ASI (Maeda, 1986), winged bean (Psophocarpus 

26 tetragonolobus) chymotrypsin inhibitor, W-CI (Shibata et al. 1988), winged 

27 bean trypsin inhibitor, W-TI (Yamamoto et al. 1983), soybean trypsin inhibitor, 

28 S-TI (Koide and Ikenaka, 1973b), Erythrina latissima trypsin inhibitor, E-TI 

29 (Joubert et al. 1985). 
30 

31 All the Kunitz-type inhibitors are of a similar size and align along their entire 

32 length. Thus the 21 kD protein must belong to this general class. 
33 
1 Example 13
2 

3 Expression of the 23 kD and 21 kD Polypeptides in E. coli
4 

5 The DNA encoding the 23 kD and 21 kD polypeptides (ie. with and without the

6 hydrophobic signal peptide) was subcloned into the E. coli expression vector, 

7 pJLA502 (Schauder et al, 1987) marketed by Medac GmbH, Postfach 303629, 

8 D-7000, Hamburg 36 (see Figure 4). The vector contains the strong lambda 

9 promoters, PL and PR, and the leader sequence and ribosome binding site of the

10 very efficiently translated E. coli gene, atpE. It also contains a 

11 temperature-sensitive cI repressor, and so expression is repressed at 30°C and

12 activated at 42°C. The vector has an NcoI site (containing an ATG codon:

13 CCATGG) correctly placed with respect to the ribosome binding site, and 

14 foreign coding sequences must be spliced in at this point. The 23 kD coding 

15 sequence does not have an NcoI site at the initial ATG, so one was introduced

16 by in vitro mutagenesis. 
17 

18 In vitro mutagenesis was carried out using a kit marketed by Amersham 

19 International, which used the method of Eckstein and co-workers (Taylor et al, 

20 1985). After annealing the mutagenic primer to single-stranded DNA the 

21 second strand synthesis incorporates alpha-thio-dCTP in place of dCTP. After 

22 extension and ligation to form closed circles, the plasmid is digested with NciI,

23 an enzyme which cannot nick DNA containing thio-dC. Thus only the original 

24 strand is nicked, and subsequently digested with exonuclease III. The original

25 strand is then resynthesised, primed by the remaining DNA fragments and 

26 complementing the mutated position in the original strand. Plasmids are then 

27 transformed into E. coli and checked by plasmid mini preparations. 
28 

29 An NcoI site was introduced into the 23 kD cDNA in plasmid pMS101 (in the

30 vector pTZ19R, so that single-stranded DNA could readily be produced) using 

31 the mutagenic primer: 5' ACTTAACCATGGAGACC 3', to create the plasmid 

32 pMS106. The primer was chosen to avoid extensive hybridisation elsewhere in 

33 the plasmid. 
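
As a quick sanity check, the mutagenic primer can be searched for the NcoI recognition sequence (CCATGG) given earlier in this Example. The following is a minimal sketch only; the helper name `find_site` is hypothetical and not part of any described procedure.

```python
NCOI_SITE = "CCATGG"  # NcoI recognition sequence, as given in the text

def find_site(seq: str, site: str = NCOI_SITE) -> int:
    """Return the 0-based index of the first occurrence of site in seq, or -1."""
    return seq.upper().find(site)

primer = "ACTTAACCATGGAGACC"  # mutagenic primer quoted above, 5' to 3'

site_index = find_site(primer)
atg_index = site_index + 2  # the ATG sits at offset 2 within CCATGG

print(len(primer))                      # 17
print(site_index)                       # 6
print(primer[atg_index:atg_index + 3])  # ATG
```

The primer as quoted therefore carries a complete NcoI site, with six bases of flanking sequence to the 5' side and five to the 3' side.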
1

2 The 23 kD coding region was cloned into the E. coli expression vector pJLA502 

3 on an NcoI-EcoRI fragment (pMS107). The coding region was then cloned back

4 into pTZ19 on a XhoI (upstream of the NcoI) - EcoRI fragment. This creates a

5 pTZ-23 kD plasmid (pMS108) which has eliminated the poly G/C region, likely 

6 to disrupt transcription between the T7 promoter in the vector and the coding 

7 region. In vitro transcription, using T7 RNA polymerase, produced abundant 

8 RNA which translated in a wheat germ system to give a 23 kD protein. This 

9 proves that a functional gene, capable of producing a protein of the anticipated 
10 size, is present on the plasmid. 

11 

12 The hydrophobic signal sequence was deleted from plasmid pMS108 using a

13 mutagenic primer designed to bind either side of the proposed deletion: 
14 

15 5' TGGAGACTGCCATGGCAAACTCTCCTGTG 3'

16 

17 The resulting plasmid, pMS111, had retained an NcoI site at the ATG start, and

18 the 21 kD coding region was subcloned into pJLA502 on an NcoI-BamHI 

19 fragment (pMS113).
20 
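
The deletion primer can be examined in the same way: the sketch below locates the NcoI site (CCATGG) and translates the primer from the ATG within that site using the standard genetic code. It assumes the primer is written in the sense orientation; the function name `translate` is ours, and the residues printed are simply what the quoted sequence encodes, not a result checked against Figure 2.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(seq: str) -> str:
    """Translate seq in frame 0, ignoring any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))

deletion_primer = "TGGAGACTGCCATGGCAAACTCTCCTGTG"  # quoted above, 5' to 3'

site_index = deletion_primer.find("CCATGG")  # position of the NcoI site
start = site_index + 2                       # ATG within CCATGG
peptide = translate(deletion_primer[start:])

print(site_index)  # 9
print(peptide)     # MANSPV
```

On this reading the primer places an initiating methionine immediately ahead of the retained coding sequence, consistent with the statement that an NcoI site was kept at the ATG start.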

21 The two expression vectors were transformed into E. coli UT580. The 

22 transformed strains were grown in L-broth + ampicillin (100 µg/ml) at 30°C

23 until log phase (OD610 = 0.5) and the temperature was then shifted to 42°C and

24 samples taken at intervals. Samples were dissociated by boiling in SDS loading 

25 buffer, and run on SDS-PAGE gels. The proteins were electroblotted onto 

26 nitrocellulose membranes (Towbin et al, 1979) and Western blotting carried out 

27 using the anti-21 kD antibody prepared in Example 3 above (at 2 µg/ml) and as

28 a secondary antibody, goat anti-rabbit IgG conjugated to alkaline phosphatase

29 (Scott et al, 1988).
30 

31 For the vector pMS107 the antibody detected a specific protein of molecular

32 weight about 23 kD, but there were also smaller bands, including one at 21 kD 

33 suggesting that E. coli was partially cleaving the hydrophobic signal. The 
1 largest amount of protein was seen after 18 hours, and was the equivalent of at

2 least 1-2 mg/l. Controls containing only the vector gave no immuno-detectable

3 proteins. For the vector pMS113 a similar result was obtained, except that only

4 the 21 kD protein was seen: there was no evidence of higher expression in the 

5 absence of the signal sequence. However transforming the vectors into the 

6 protease-deficient strain CAG629 (Dr C.A. Gross) resulted in a much higher 

7 level of expression in both cases, in the order of 5-10 mg/l.
8 

9 
10 

11 Example 14
12 

13 Expression of the 21/23 kD Polypeptides in Yeast (Saccharomyces cerevisiae) 
14 

15 Two yeast expression vectors were used, both based on a yeast-E. coli shuttle

16 vector containing yeast and E. coli origins of replication, and suitable selectable 

17 markers (ampicillin-resistance for E. coli and leucine auxotrophy for yeast). 

18 Both vectors contain the yeast pyruvate kinase (PK) promoter and leader 

19 sequence and have a HindIII cloning site downstream of the promoter. One

20 vector, A, is designed for internal expression, and the other, B, for secreted 

21 expression, having a portion of the signal sequence of the yeast mating 

22 alpha-factor downstream of the promoter, with a HindIII site within it to create

23 fusion proteins with incoming coding sequences. The vectors are illustrated in 

24 Figure 5. 
25 

26 To use the vectors effectively it is desirable to introduce the foreign coding 

27 region such that for vector A, the region from the HindIII cloning site to the

28 ATG start is the same as the yeast PK gene, and for vector B, the remainder of 

29 the alpha-factor signal, including the lysine at the cleavage point. In practice

30 this situation was achieved by synthesising two sets of HindIII - NcoI linkers to

31 bridge the gap between the HindIII cloning site in the vector and the NcoI at the

32 ATG start of the coding sequence. For vector B, when the coding sequence is 

33 to be spliced to the yeast alpha-factor signal, the coding region of the 21 kD 
1 polypeptide (ie. with the cocoa signal sequence removed) was used. The

2 constructs are illustrated in Figure 6. For ease of construction of the yeast 

3 vectors, HindIII - NcoI linkers were first cloned into the appropriate pTZ

4 plasmids, and HindIII - BamHI fragments containing linkers plus coding region

5 cloned into the yeast vector. 
6 

7 The yeast expression plasmids were transferred into yeast spheroplasts using the 

8 method of Johnston (1988). The transformation host was the LEU- strain

9 AH22, and transformants were selected on leucine-minus minimal medium.

10 LEU+ transformants were streaked to single colonies, which were grown in 50

11 ml YEPD medium (Johnston, 1988) at 28°C for testing the extent and 

12 distribution of foreign protein. Cells were harvested from cultures in 

13 preweighed tubes in a bench-top centrifuge, and washed in 10 ml lysis buffer 

14 (200mM Tris, pH 8.1; 10% glycerol). The cell medium was reserved and 

15 concentrated 10-25 x in an AMICON mini concentrator. (The word AMICON 

16 is a trade mark.) The washed cells were weighed and resuspended in lysis buffer 

17 plus protease inhibitors (1mM phenyl methyl sulphonyl fluoride (PMSF); 1

18 µg/ml aprotinin; 0.5 µg/ml leupeptin) at a concentration of 1 g/ml. 1 volume

19 acid-washed glass-beads was added and the cells broken by vortexing for 8 

20 minutes in total, in 1 minute bursts, with 1 minute intervals on ice. After 

21 checking under the microscope for cell breakage, the mixture was centrifuged at 

22 7000 rpm for 3 minutes to pellet the glass beads. The supernatant was removed 

23 to a pre-chilled centrifuge tube, and centrifuged for 1 hour at 20,000 rpm. 

24 (Small samples can be centrifuged in a microcentrifuge in the cold.) The 

25 supernatant constitutes the soluble fraction. The pellet was resuspended in 1 ml 

26 lysis buffer plus 10% SDS and 1% mercaptoethanol and heated at 90°C for 10 

27 minutes. After centrifuging for 15 minutes in a microcentrifuge the supernatant 

28 constitutes the particulate fraction. 
29 

30 Samples of each fraction and the concentrated medium were examined by 

31 Western blotting. Plasmid pMS116, designed for internal expression, produced 

32 both 23 kD and 21 kD polypeptides in the soluble fraction of the cell lysate, and 

33 in the medium considerable amounts (2-5 mg/l) of the 21 kD polypeptide. Thus
1 the yeast is recognising the cocoa signal sequence and transporting the protein

2 across the membrane, cleaving the signal during the process. The cleavage site 

3 appears to be correct, judging by the size of the final protein. 
4 

5 Plasmid pMS117, designed for secreted expression, gave a rather similar result 

6 with rather more 21 kD polypeptide in the medium. No evidence of the 

7 uncleaved polypeptide with the yeast alpha-factor signal still attached was 

8 found, either in the soluble or particulate fraction. 
9 

10 

11 Example 15 
12 

13 Scale-up of Production of the 21 kD Protein in a 5L Fermenter
14 

15 To assess the productivity of the 21 kD protein from yeast AH22 containing the 

16 plasmid pMS117 under scale-up conditions the strain was grown in a 5L 

17 bioreactor manufactured by Life Technologies Inc. Like the small-scale growth 

18 experiments the medium used was YEPD, and the inoculum was 10 ml of a late 

19 log phase culture (OD600 4.0). The aeration rate was 2L/min and the stirring

20 speed 350 rpm, and to control the foaming caused by these aeration and stirring 

21 speeds 10 ml safflower oil was added. The cells were just entering log phase 

22 after 10 hours and by 15 hours the log phase was over with the disappearance of 

23 the glucose and accumulation of ethanol. However growth continued until the 

24 harvesting point at 60 hours, with the concomitant oxidation of the ethanol. 

25 The final biomass was 28 g/L wet weight, 7.3 g/L dry weight. Western 

26 blotting of the medium showed that 21 kD protein was exported to the medium 

27 slowly at first, but accumulated rapidly in late stationary phase rising to 

28 approximately 20-30 mg/L at the time of harvesting. 
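
To put the harvest figures on a per-biomass basis, the quoted titre can be divided by the quoted dry weight. The short sketch below does only that arithmetic; the variable names are ours and the derived values are not stated in the original.

```python
# Figures quoted in the text above (values at the 60 hour harvest).
wet_biomass_g_per_l = 28.0
dry_biomass_g_per_l = 7.3
titre_mg_per_l = (20.0, 30.0)  # approximate range of 21 kD protein in the medium

# Secreted protein per gram of dry biomass, for each end of the range.
specific_yield_mg_per_g = tuple(
    round(t / dry_biomass_g_per_l, 2) for t in titre_mg_per_l
)

# Fraction of the wet weight that remains as dry weight.
dry_fraction = round(dry_biomass_g_per_l / wet_biomass_g_per_l, 3)

print(specific_yield_mg_per_g)  # (2.74, 4.11)
print(dry_fraction)             # 0.261
```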
29 

30 At the end of the experiment yeast cells were removed from the medium by 

31 cross-flow filtration through a 0.2 µm membrane, and the protein (or

32 macromolecular) constituents in the medium were concentrated by cross-flow 

33 filtration through an ultrafiltration membrane with a molecular weight cut-off
1 of 10 kD. The crossflow filtration apparatus was manufactured by Sartorius

2 GmbH, Goettingen, Germany. The 21 kD protein can be further crudely 

3 purified by precipitation with 80% ammonium sulphate, followed by 

4 redissolving in water and dialysis. 
5 

6 Some enhancement of the yield was obtained by a batch feed process whereby 

7 the glucose levels were topped up to 2% from a concentrated solution as soon as 

8 the glucose levels had dropped below 0.1%. Four such additions were made at

9 16, 23, 34 and 37 hours, and growth continued until 58 hours. Improved yields 

10 of the 21 kD protein were obtained, up to 50 mg/L by the end of the 

11 experiment. 
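
The top-up rule described above (restore glucose to 2% once it has fallen below 0.1%) implies a simple mass balance for the volume of concentrated feed required. The sketch below illustrates that balance only. The 50% (w/v) stock concentration and the 5 L culture volume are assumptions chosen for illustration, since the text states neither the strength of the concentrated solution nor the working volume, and the function name `feed_volume_l` is hypothetical.

```python
def feed_volume_l(culture_volume_l: float, current_pct: float,
                  target_pct: float, stock_pct: float) -> float:
    """Volume of stock (L) needed to raise glucose from current_pct to target_pct.

    Mass balance, allowing for dilution by the added feed:
        V * current + v * stock = (V + v) * target
    """
    if not current_pct < target_pct < stock_pct:
        raise ValueError("expected current_pct < target_pct < stock_pct")
    return culture_volume_l * (target_pct - current_pct) / (stock_pct - target_pct)

# Assumed values: 5 L culture and a 50% (w/v) glucose stock.
volume = feed_volume_l(culture_volume_l=5.0, current_pct=0.1,
                       target_pct=2.0, stock_pct=50.0)

print(round(volume, 3))  # 0.198
```

Under these assumed values each addition would be roughly 0.2 L, so four additions would add appreciably to the culture volume; a more concentrated stock would reduce that dilution.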
12 

13 Example 16 
14 

15 Expression of the 23 kD/21 kD Protein in Hansenula polymorpha
16 

17 The methylotrophic yeast Hansenula polymorpha offers a number of advantages

18 over Saccharomyces cerevisiae as a host for the expression of heterologous

19 proteins (EP-A-0173378 and Sudbery et al, 1988). The yeast will grow on

20 methanol as sole carbon source, and under these conditions the enzyme 

21 methanol oxidase (MOX) can represent up to 40% of the total cell protein. 

22 Thus the MOX promoter is a very powerful one that can be used in a vector to 

23 drive the synthesis of heterologous proteins, and it is effective even as a single 

24 copy. This gives the potential to use stable integrated vectors. Hansenula can 

25 also grow on rich carbon sources such as glucose, in which case the MOX 

26 promoter is completely repressed. This means that cells containing the 

27 heterologous gene can be grown to a high density on glucose, and induced to 

28 produce the foreign protein by allowing the glucose to run out and adding 

29 methanol. 
30 

31 Constructs (pMY10 and pMY9) containing a 21 kD or 23 kD gene sandwiched

32 between a MOX promoter and MOX terminator were made in the yeast 

33 episomal plasmid YEp13. Both contained a yeast secretion signal from
1 invertase spliced to the cocoa gene coding region, as illustrated in Figure 7.

2 These constructs were transformed into Hansenula and both secreted the 21/23 

3 kD protein into the medium under inducing conditions, although pMY10,

4 containing the yeast signal but not the plant signal, was the more effective.
5 

6 The Hansenula construct pMY10 has also been grown under scale-up conditions

7 in a fermenter, and biomass yields of 45 g/L dry weight were obtained after

8 induction with methanol. After induction the 21 kD protein was found in the 

9 medium in increasing amounts up to 50 mg/L.
10 

1 CLAIMS
2 

3 1. A 23kD protein of Th. cacao, or a fragment thereof. 
4 

5 2. A 21kD protein of Th. cacao, or a fragment thereof. 
6 

7 3. A protein as claimed in claim 1 or 2 having at least part of the sequence 

8 shown in Figure 2. 
9 

10 4. A fragment as claimed in claim 1, 2, or 3 which comprises at least four 

11 amino acids. 
12 

13 5. A protein or fragment as claimed in any one of claims 1 to 4 which is 

14 recombinant. 
15 

16 6. Recombinant or isolated nucleic acid coding for a protein or fragment as 

17 claimed in any one of claims 1 to 5. 
18 

19 7. Nucleic acid as claimed in claim 6 which is DNA. 
20 

21 8. Nucleic acid as claimed in claim 7 having at least part of the sequence 

22 shown in Figure 2. 
23 

24 9. Nucleic acid as claimed in claim 6, 7 or 8, which is in the form of a 

25 vector. 
26 

27 10. Nucleic acid as claimed in claim 9, wherein the vector is an expression 

28 vector and the protein- or fragment-coding sequence is operably linked to a 

29 promoter. 

30 11. Nucleic acid as claimed in claim 10, wherein the expression vector is a 

31 yeast expression vector and the promoter is a yeast pyruvate kinase (PK) 

32 promoter. 
33 

1 12. Nucleic acid as claimed in claim 10, wherein the expression vector is a 

2 bacterial expression vector and the promoter is a strong lambda promoter. 

3 

4 13. Nucleic acid as claimed in claim 10, 11 or 12, comprising a signal 

5 sequence. 
6 

7 14. A host cell comprising nucleic acid as claimed in any one of claims 9 to 

8 13. 
9 

10 15. A host cell as claimed in claim 14 which is Saccharomyces cerevisiae. 
11 

12 16. A host cell as claimed in claim 14 which is E. coli. 
13 

14 17. A process for the preparation of a protein or fragment as claimed in any 

15 one of claims 1 to 4, the process comprising coupling successive amino acids by 

16 peptide bond formation. 
17 

18 18. A process for the preparation of a protein or fragment as claimed in any 

19 one of claims 1 to 4, the process comprising culturing a host cell as claimed in 

20 claim 14, 15 or 16. 
21 

22 19. A process for the preparation of nucleic acid as claimed in any one of 

23 claims 6 to 13, the process comprising coupling together successive nucleotides 

24 and/or ligating oligo- or poly-nucleotides. 
25 
